class: center, middle, inverse, title-slide

# Practical Issues
## .big[🎨 🔬 📝]

### Applied Machine Learning in R
Pittsburgh Summer Methodology Series

### Lecture 4-B
July 22, 2021

---
class: inverse, center, middle
# Overview

<style type="text/css">
.onecol { font-size: 26px; }
.twocol { font-size: 24px; }
.remark-code { border: 1px solid grey; }
a { background-color: lightblue; }
.remark-inline-code { background-color: white; }
</style>

---
## Lecture Topics

</br></br>
<img src="data:image/png;base64,#researchworkflow.png" width="100%" height="115%" />

---
class: inverse, center, middle
# Designing Studies

---
class: onecol
## Back to Basics

The goal of **supervised machine learning** is to **predict** unknown values of important variables in new data.

We are less concerned about statistical **inference** (e.g., which features are significantly related to the outcome).

Features used to predict tend to be **cheaper or easier to measure** in new data.

Labels to be predicted tend to be **expensive or difficult to measure** in new data.

ML models can be used for **regression** or **classification** problems.

---
class: onecol
## Asking Predictive Questions

As psychologists (and other social and behavioral scientists), we want to **explain** and **predict** behavior.

An implicit assumption is that better explanation will lead to better prediction. However, statistically, this is not always the case (e.g., due to overfitting).

Shifting from an explanatory/inferential mindset ("what are the causal mechanisms underlying these behaviors?") to a predictive mindset ("how can we best forecast future behaviors we haven't observed?") isn't easy!

.footnote[
I highly recommend reading Yarkoni & Westfall (2017), Choosing Prediction over Explanation in Psychology: Lessons from Machine Learning. https://journals.sagepub.com/doi/10.1177/1745691617693393
]

---
class: twocol
## Asking Predictive Questions

Question: Can we infer people's personalities from their social media usage?
- **Inferential mindset**: test for statistically significant relationships between personality dimensions and other variables (e.g., ratings of someone's Facebook profile).
- **Predictive mindset**: build an ML model with the primary goal of predicting someone's scores on a personality questionnaire from social media data.

Question: Can we understand how likely someone is to recover from an anxiety disorder?

- **Inferential mindset**: identify variables at time 1 that have a statistically significant relationship with recovery at time 2.
- **Predictive mindset**: build an ML model with the goal of using time 1 data to accurately predict anxiety scores at time 2.

---
class: onecol
## Cause and Effect

Machine learning is a **data-driven** approach. However, this does not mean that it is **atheoretical**.

--

A strong understanding of the **underlying causal structure** of labels is crucial for optimizing model performance and feature selection.

Models with features that are **causes** of the outcome have higher predictive accuracy in future datasets than models with features that are **mere correlates** of the outcome.<sup>1</sup>

The design of ML studies should be driven by strong **theory**.

.footnote[
[1] See Piccininni et al. (2020) for theoretical explanation and simulation results: https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-020-01058-z
]

---
class: onecol
## Timescale of Effects

The **timeframe** of studies should also be theoretically motivated.

Over what timescale are the features and labels expected to change?

Over what timescale is the feature expected to lead to change in the label?
--

<img src="data:image/png;base64,#franklin2017.png" width="53%" />

---
class: onecol
## Measurement

*"Throwing the same set of poorly measured variables that have been analyzed before into machine learning algorithms is highly unlikely to produce new insights or findings."*<sup>1</sup>

**Questionable measurement practices**<sup>2</sup> are vast and include things like unclear definitions of constructs, lack of reliability and validity of measures, and using scales in ways they were not intended.

Measurement error can prevent ML algorithms from accurately capturing and modeling the true relationships between features and labels.

.footnote[
[1] Jacobucci & Grimm (2020); *Perspectives on Psychological Science* </br>
[2] Flake & Fried (2020); *Advances in Methods and Practices in Psychological Science*
]

---
class: onecol
## Measurement

<img src="data:image/png;base64,#qmps.png" width="85%" />

.footnote[
Flake & Fried (2020); *Advances in Methods and Practices in Psychological Science*
]

---
class: onecol
## Sample Size

One of the most common questions we hear is "how much data do I need for ML?"

There's no straightforward or universal answer, and this is an active area of research. However, here are some important principles and rules of thumb.

--

</br>

When working with **large datasets**, use a **single held-out test set**.

Use **nested cross-validation** with inner and outer loops for smaller samples rather than a single, held-out test set.<sup>1</sup>

Aim for `\(N \geq p*10\)`, with a minimum of 30 observations per test set.

.footnote[
[1] See Kuhn & Johnson (2013), section 4.7 for more details.
]

---
class: onecol
## Bias and Representativeness

Though often heralded as 'objective', algorithms reflect the nature of the data used to train them, and can reproduce and amplify human biases.

Recidivism algorithms are biased against Black defendants, chatbots trained on internet data produce sexist & racist responses, facial recognition works better for White people, and STEM advertisements are less likely to be displayed to women than men.
--

.pull-left[
<img src="data:image/png;base64,#huggingface_man.png" width="62%" height="63%" style="display: block; margin: auto 0 auto auto;" />
]
.pull-right[
<img src="data:image/png;base64,#huggingface_woman.png" width="65%" height="65%" style="display: block; margin: auto auto auto 0;" />
]

---
class: onecol
## Bias and Representativeness

Training an ML model with **biased data** will produce **biased predictions**.

When designing a study and collecting data, pay attention to who you are including in your sample, and who is being left out.

Critically evaluate the features and labels you're collecting. Are they **reliable and valid** for all groups of people? Are they accurate and sensitive at assessing the label of interest for everyone?

---
class: onecol
## Group Discussion

We will randomly assign you to a small breakout room. We will jump between rooms to join discussions and answer questions.

**Introduce yourselves again and discuss the following topics:**

1. What types of ML studies are you interested in designing in your field? How will you shift from an inferential to a predictive mindset?

2. What problems, concerns, or challenges do you foresee in designing ML studies (e.g., in causal thinking, sample size, measurement, bias)?

3. What are some potential solutions?
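---
class: onecol
## Aside: Sample Size Rules of Thumb in Code

The rule of thumb from earlier (`\(N \geq p*10\)`, with at least 30 observations per test set) can be turned into a tiny helper. This is just an illustration; the function name and defaults are our own, not from any package.

```r
# Rule-of-thumb minimum N for p features (assumes 10 observations per feature)
min_sample_size <- function(p, obs_per_feature = 10) {
  p * obs_per_feature
}

min_sample_size(p = 12)  # 120

# With 10-fold CV, each test fold holds roughly N / 10 observations,
# so N = 300 yields about 30 observations per test set
ceiling(300 / 10)        # 30
```

Treat these numbers as a starting point, not a guarantee; the N you actually need also depends on the algorithm, effect sizes, and measurement quality.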
---
class: inverse, center, middle
# Modeling Decisions

---
## Choosing between algorithms

Algorithm | Benefits | Drawbacks
:------- | :-------- | :-------
Ridge | handles multicollinearity; shrinks correlated features towards each other | does not perform feature selection; does not model nonlinearity
Lasso | handles multicollinearity; performs feature selection | tends to pick one correlated feature and reduce the other to zero; does not model nonlinearity
Elastic Net | ridge-like shrinkage with lasso-like feature selection | does not model nonlinearity
Decision Trees | easily interpretable; models nonlinearity | unstable; poor prediction in new datasets (not often used in practice)
Random Forests | models nonlinearity; good prediction in new data | not easily interpretable; requires larger sample sizes
Support Vector Machines | can handle `\(p>n\)`; models nonlinearity | not easily interpretable; can be difficult to choose a 'good' kernel function

---
class: twocol
## Algorithm assumptions

Algorithm | Assumptions
:------- | :--------
Regularized regression (ridge, lasso, elastic net) | Linear relationship between features and outcome. Features should be on the same scale (normalize!). Dummy code nominal features.
Decision Trees and Random Forests | No formal assumptions; decision trees and random forests are non-parametric. Features do not have to be on the same scale. One-hot encode nominal features.
Support Vector Machines | Features should be on the same scale (normalize!). Dummy code nominal features.

</br>

Note: unlike familiar statistical methods (e.g., linear regression models), these methods have no distributional assumptions about error terms.

---
class: onecol
## Reproducibility

In order to have **fully reproducible** models, you often need to set **seeds** within {caret}'s `trainControl()` function.

The number of seeds to produce is `\(B+1\)`, where `\(B\)` is the number of resamples.
E.g., for 10-fold cross-validation repeated 3 times, you will need 31 seeds. .scroll-output[ ```r set.seed(2021) seeds <- vector(mode = "list", length = 30) # length = n data splits for(i in 1:30) seeds[[i]] <- sample.int(1000, 6) # 6 = tune length seeds[[31]] <- sample.int(1000, 1) # for the last model seeds ``` ``` ## [[1]] ## [1] 903 166 430 442 743 908 ## ## [[2]] ## [1] 70 192 934 763 614 622 ## ## [[3]] ## [1] 325 495 103 361 535 332 ## ## [[4]] ## [1] 947 956 146 123 188 637 ## ## [[5]] ## [1] 387 814 360 538 164 101 ## ## [[6]] ## [1] 342 671 752 354 403 580 ## ## [[7]] ## [1] 373 854 645 73 191 294 ## ## [[8]] ## [1] 274 683 827 754 70 342 ## ## [[9]] ## [1] 710 703 79 674 150 626 ## ## [[10]] ## [1] 400 335 881 600 529 742 ## ## [[11]] ## [1] 357 148 452 990 456 467 ## ## [[12]] ## [1] 67 642 785 360 681 121 ## ## [[13]] ## [1] 799 511 369 713 989 155 ## ## [[14]] ## [1] 504 395 369 932 934 622 ## ## [[15]] ## [1] 126 463 722 585 531 528 ## ## [[16]] ## [1] 482 869 639 362 510 758 ## ## [[17]] ## [1] 300 790 550 120 27 812 ## ## [[18]] ## [1] 781 801 430 661 883 601 ## ## [[19]] ## [1] 95 685 75 905 535 235 ## ## [[20]] ## [1] 982 420 606 302 432 775 ## ## [[21]] ## [1] 285 385 659 877 625 420 ## ## [[22]] ## [1] 894 713 115 625 329 650 ## ## [[23]] ## [1] 632 321 123 293 427 925 ## ## [[24]] ## [1] 848 197 187 659 844 570 ## ## [[25]] ## [1] 669 7 320 172 569 247 ## ## [[26]] ## [1] 139 579 302 20 663 557 ## ## [[27]] ## [1] 513 655 373 992 215 155 ## ## [[28]] ## [1] 109 321 371 900 961 13 ## ## [[29]] ## [1] 710 606 85 302 551 896 ## ## [[30]] ## [1] 446 653 165 624 458 651 ## ## [[31]] ## [1] 501 ``` ] --- class:onecol ## Reproducibility This `seeds` object can be included in the `trainControl()` function to obtain the same resampling seeds each time the code is run, for fully reproducible results. 
```r
model_control <- trainControl(method = 'repeatedcv',
                              number = 10,
                              repeats = 3,
                              seeds = seeds)
```

---
class: onecol
## Resampling Methods

There are many resampling methods available: held-out validation set, cross-validation, repeated cross-validation, leave-one-out cross-validation, and bootstrapping. Which approach is best?

--

There is no one-size-fits-all answer, but in *general*, 5-fold and 10-fold cross-validation have been shown to provide a good compromise for the bias-variance tradeoff problem.

Using a single held-out validation set -- or LOOCV at the other extreme -- can suffer from high **variance**. LOOCV also becomes computationally infeasible with large datasets.

Bootstrapping tends to have significant pessimistic **bias**. When sample sizes are small, **repeated cross-validation** is a good option.

---
class: onecol
## Nested Cross-Validation

<img src="data:image/png;base64,#nestedcv1.png" width="72%" />

---
class: onecol
## Nested Cross-Validation

<img src="data:image/png;base64,#nestedcv2.png" width="72%" />

---
class: inverse, center, middle
# Interpreting Results

---
class: onecol
## What's "good" accuracy?

Not all performance metrics are equally informative for all prediction problems.

E.g., is it more important to detect true positives (sensitivity/recall) than to avoid false positives (specificity)? This may differ across different modeling problems.

--

<img src="data:image/png;base64,#ppv_suicide.png" width="50%" />

---
class: onecol
## Can you trust your model?

The predictive modeling process assumes that the same underlying **processes** that generated the current features and outcomes will continue to generate data from the same mechanisms.

If new data are generated by different mechanisms (or if your training dataset was too small), your model may not perform well in the future.

It's difficult to know whether this will be the case during the ML model training and tuning process. External validation is always desirable, whenever possible.
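---
class: onecol
## Aside: Nested CV Sketch

As a rough sketch of the nested cross-validation idea from earlier: an outer loop of folds estimates performance, while `train()`'s internal resampling serves as the inner loop for tuning. All specifics here (5 outer folds, `glmnet`, the `iris` data) are illustrative choices, not a recommendation.

```r
library(caret)

set.seed(2021)
# Outer loop: 5 folds; returnTrain = TRUE gives training-row indices
outer_folds <- createFolds(iris$Species, k = 5, returnTrain = TRUE)

outer_acc <- sapply(outer_folds, function(train_idx) {
  # Inner loop: 10-fold CV inside train() tunes the hyperparameters
  fit <- train(Species ~ ., data = iris[train_idx, ],
               method = "glmnet",
               trControl = trainControl(method = "cv", number = 10))
  # Evaluate the tuned model on the held-out outer fold
  preds <- predict(fit, newdata = iris[-train_idx, ])
  mean(preds == iris$Species[-train_idx])
})

mean(outer_acc)  # performance estimate averaged over outer folds
```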
---
class: onecol
## Importance of External Validation

ML models have been very popular in predicting COVID-19 outcomes. However, when 22 published ML models were submitted to external validation, *none* beat simple univariate predictors.

<img src="data:image/png;base64,#covid.png" width="50%" />

---
class: inverse, center, middle
# Writing

---
class: onecol
## What to Include in Methods

Writing Methods sections for ML papers can feel different from "traditional" studies with an inferential statistics framework.

ML is a relatively new method in the social and behavioral sciences, so the Methods section may require more detailed explanation of the model training, tuning, resampling, and evaluation processes<sup>1</sup>.

Ideally, an informed reader and ML practitioner (a.k.a. all of you!) should be able to **reproduce** the models in papers from their Methods section.

.footnote[
[1] This can differ in degree based on your specific field/subfield and which journals you submit to.
]

---
class: onecol
## What to Include in Methods

Here's a non-exhaustive list of steps to describe:

- Feature selection
- Feature engineering/preprocessing
- Resampling methods (e.g., 10-fold cross-validation)
- Specific algorithm(s) used and why
- Evaluation metric(s) used and why
- Model comparison
- Variable importance
- External validation (if applicable)

---
class: twocol
## Methods Example: Setting the Stage

It's common to use both inferential statistics and predictive modeling in the same paper. Clearly distinguishing between the two is helpful:

<img src="data:image/png;base64,#girard1.png" width="75%" />

<img src="data:image/png;base64,#girard2.png" width="75%" />

.footnote[
From Girard et al. (2021): https://psyarxiv.com/zc47p
]

---
class: twocol
## Methods Example: Algorithm

Providing some justification and a (brief) explanation of each algorithm is useful.
.pull-left[
<img src="data:image/png;base64,#girard3.png" width="100%" />
]
.pull-right[
<img src="data:image/png;base64,#wang_psychmed.png" width="80%" />
]

.footnote[
From Girard et al. (2021): https://psyarxiv.com/zc47p and Haynos et al. (2020): https://doi.org/10.1017/S0033291720000227
]

---
class: twocol
## Methods Example: Resampling

State (in precise terms) your resampling methods, making sure to clearly describe the separation of model training and testing.

<img src="data:image/png;base64,#girard4.png" width="75%" />

.footnote[
From Girard et al. (2021): https://psyarxiv.com/zc47p
]

---
class: twocol
## Methods Example: Performance Metrics

Describe your performance metrics, including the primary model evaluation metric used during hyperparameter tuning.

Provide some information about how to interpret these metrics, as they may be unfamiliar to your audience.

<img src="data:image/png;base64,#wang2021.png" width="75%" />

.footnote[
From Wang et al. (2021): https://doi.org/10.1001/jamanetworkopen.2021.0591
]

---
class: onecol
## What to Include in Results

Writing Results sections for ML papers can also feel different from those for traditional inferential statistical analyses!

Here's a non-exhaustive list of results to include:

- Sample size for each model
- Performance of each model
- Some measure of uncertainty/variance around model accuracy
- Comparison between models (or to a baseline/chance prediction)
- Variable importance

---
class: twocol
## Results Example: Model Performance and Comparison

State results from each model and comparisons (if applicable).

<img src="data:image/png;base64,#girard5.png" width="75%" />

.footnote[
From Girard et al. (2021): https://psyarxiv.com/zc47p
]

---
class: twocol
## Results Example: Model Performance and Comparison

State results from each model and comparisons (if applicable).

<img src="data:image/png;base64,#wang_results.png" width="50%" />

.footnote[
From Wang et al.
(2021): https://doi.org/10.1001/jamanetworkopen.2021.0591
]

---
class: twocol
## Results Example: Uncertainty/Variance

Plotting results from each CV fold can help show the variance around model performance metrics.

<img src="data:image/png;base64,#wang_cvplot.png" width="48%" />

.footnote[
From Wang et al. (2021): https://doi.org/10.1001/jamanetworkopen.2021.0591
]

---
class: onecol
## Publication-Ready Figures

.left-column[
</br>
<img src="data:image/png;base64,#ggplot.png" width="100%" />
]
.right-column[
We strongly recommend {ggplot2} for making publication-ready figures.

The `qplot()` function is a convenient shortcut for making ggplots for folks who are used to base `plot()` in R.

In the long run, though, learning {ggplot2} is really useful: it allows you to make more complex, customizable, publication-ready figures.
]

---
class: twocol
## The Grammar of Graphics: Basic Elements

.pull-left[
**Data** </br><font size="5">describe observations with variables</font>

**Aesthetic Mappings** </br><font size="5">map data variables to visual qualities</font>

**Scales** </br><font size="5">map values in data space to values in aesthetic space (create axes and legends) </font>

**Geometric Objects** </br><font size="5">constitute the objects seen on a plot</font>
]

---
class: twocol
## The Grammar of Graphics: Basic Elements

.pull-left[
**Data** </br><font size="5">describe observations with variables</font>

**Aesthetic Mappings** </br><font size="5">map data variables to visual qualities</font>

**Scales** </br><font size="5">map values in data space to values in aesthetic space (create axes and legends) </font>

**Geometric Objects** </br><font size="5">constitute the objects seen on a plot</font>
]
.pull-right[
<table style='width:10%;'>
<thead>
<tr>
<th style="text-align:right;"> Sepal.Length </th>
<th style="text-align:right;"> Sepal.Width </th>
<th style="text-align:left;"> Species </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:right;"> 5.1 </td>
<td
style="text-align:right;"> 3.5 </td>
<td style="text-align:left;"> setosa </td>
</tr>
<tr>
<td style="text-align:right;"> 4.9 </td>
<td style="text-align:right;"> 3.0 </td>
<td style="text-align:left;"> setosa </td>
</tr>
<tr>
<td style="text-align:right;"> 4.7 </td>
<td style="text-align:right;"> 3.2 </td>
<td style="text-align:left;"> setosa </td>
</tr>
<tr>
<td style="text-align:right;"> 4.6 </td>
<td style="text-align:right;"> 3.1 </td>
<td style="text-align:left;"> setosa </td>
</tr>
<tr>
<td style="text-align:right;"> 5.0 </td>
<td style="text-align:right;"> 3.6 </td>
<td style="text-align:left;"> setosa </td>
</tr>
<tr>
<td style="text-align:right;"> 5.4 </td>
<td style="text-align:right;"> 3.9 </td>
<td style="text-align:left;"> setosa </td>
</tr>
<tr>
<td style="text-align:right;"> 4.6 </td>
<td style="text-align:right;"> 3.4 </td>
<td style="text-align:left;"> setosa </td>
</tr>
<tr>
<td style="text-align:right;"> 5.0 </td>
<td style="text-align:right;"> 3.4 </td>
<td style="text-align:left;"> setosa </td>
</tr>
</tbody>
</table>
]

---
class: twocol
## The Grammar of Graphics: Basic Elements

.pull-left[
**Data** </br><font size="5">describe observations with variables</font>

**Aesthetic Mappings** </br><font size="5">map data variables to visual qualities</font>

**Scales** </br><font size="5">map values in data space to values in aesthetic space (create axes and legends) </font>

**Geometric Objects** </br><font size="5">constitute the objects seen on a plot</font>
]
.pull-right[
</br>
<img src="data:image/png;base64,#aesggplot.png" width="100%" />
]

---
class: twocol
## The Grammar of Graphics: Basic Elements

.pull-left[
**Data** </br><font size="5">describe observations with variables</font>

**Aesthetic Mappings** </br><font size="5">map data variables to visual qualities</font>

**Scales** </br><font size="5">map values in data space to values in aesthetic space (create axes and legends) </font>

**Geometric Objects** </br><font size="5">constitute the objects seen on a plot</font>
]
.pull-right[
```r
ggplot()
```
<img src="data:image/png;base64,#Day_4B_Slides_files/figure-html/unnamed-chunk-24-1.png" width="100%" />
]

---
class: twocol
## The Grammar of Graphics: Basic Elements

.pull-left[
**Data** </br><font size="5">describe observations with variables</font>

**Aesthetic Mappings** </br><font size="5">map data variables to visual qualities</font>

**Scales** </br><font size="5">map values in data space to values in aesthetic space (create axes and legends) </font>

**Geometric Objects** </br><font size="5">constitute the objects seen on a plot</font>
]
.pull-right[
```r
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width))
```
<img src="data:image/png;base64,#Day_4B_Slides_files/figure-html/unnamed-chunk-25-1.png" width="100%" />
]

---
class: twocol
## The Grammar of Graphics: Basic Elements

.pull-left[
**Data** </br><font size="5">describe observations with variables</font>

**Aesthetic Mappings** </br><font size="5">map data variables to visual qualities</font>

**Scales** </br><font size="5">map values in data space to values in aesthetic space (create axes and legends) </font>

**Geometric Objects** </br><font size="5">constitute the objects seen on a plot</font>
]
.pull-right[
```r
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()
```
<img src="data:image/png;base64,#Day_4B_Slides_files/figure-html/unnamed-chunk-26-1.png" width="100%" />
]

---
class: twocol
## The Grammar of Graphics: Basic Elements

.pull-left[
**Data** </br><font size="5">describe observations with variables</font>

**Aesthetic Mappings** </br><font size="5">map data variables to visual qualities</font>

**Scales** </br><font size="5">map values in data space to values in aesthetic space (create axes and legends) </font>

**Geometric Objects** </br><font size="5">constitute the objects seen on a plot</font>
]
.pull-right[
```r
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(aes(shape = Species)) +
  geom_smooth(method = "lm")
```
<img
src="data:image/png;base64,#Day_4B_Slides_files/figure-html/unnamed-chunk-27-1.png" width="100%" />
]

.footnote[
Geoms can be combined and layered in interesting ways.
]

---
class: twocol
## The Grammar of Graphics: Basic Elements

.pull-left[
**Data** </br><font size="5">describe observations with variables</font>

**Aesthetic Mappings** </br><font size="5">map data variables to visual qualities</font>

**Scales** </br><font size="5">map values in data space to values in aesthetic space (create axes and legends) </font>

**Geometric Objects** </br><font size="5">constitute the objects seen on a plot</font>

**Themes** </br><font size="5">control non-data elements of a graphic</font>
]
.pull-right[
```r
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(aes(shape = Species)) +
  geom_smooth(method = "lm") +
  theme_bw()
```
<img src="data:image/png;base64,#Day_4B_Slides_files/figure-html/unnamed-chunk-28-1.png" width="100%" />
]

---
class: twocol
## The Grammar of Graphics: Basic Elements

.pull-left[
**Data** </br><font size="5">describe observations with variables</font>

**Aesthetic Mappings** </br><font size="5">map data variables to visual qualities</font>

**Scales** </br><font size="5">map values in data space to values in aesthetic space (create axes and legends) </font>

**Geometric Objects** </br><font size="5">constitute the objects seen on a plot</font>

**Themes** </br><font size="5">control non-data elements of a graphic</font>
]
.pull-right[
```r
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(aes(shape = Species)) +
  geom_smooth(method = "lm") +
  theme_dark()
```
<img src="data:image/png;base64,#Day_4B_Slides_files/figure-html/unnamed-chunk-29-1.png" width="100%" />
]

---
class: onecol
## {ggplot2}

To learn more about {ggplot2}, check out some of these free resources online (there are many more tutorials/materials too)!
.left-column[
<img src="data:image/png;base64,#ggplotbook.jpg" width="100%" />
]
.right-column[
- https://ggplot2.tidyverse.org/
- https://ggplot2-book.org/index.html
- https://r-graphics.org/index.html
- https://jmgirard.com/data-viz/
- https://github.com/rstudio/cheatsheets/blob/master/data-visualization-2.1.pdf
]

---
class: inverse, center, middle
# Peer-Reviewing ML Papers

---
class: onecol
## Peer-Reviewing ML Papers

The machine learning literature in psychology and other social and behavioral sciences is **growing rapidly**.

This can feel like a double-edged sword. On the one hand, an increased focus on prediction is important for solving many problems we care about. On the other, some published papers may not be high-quality, or people may rush to publish before fully understanding their models.

Either way, this means there are **more and more ML papers in need of careful review**.

As responsible practitioners of ML, you should be well-equipped to review such papers with a focus on methods and results (and interpretation of results).

---
class: onecol
## Is the Sample Appropriate?

.left-column[
</br>
<img src="data:image/png;base64,#sample.png" width="100%" />
]
.right-column[
- Population to sample match
- Representativeness of the sample
  - Who is included?
  - Who is left out?
- Adequate sample size (for the specific modeling approaches/algorithms employed)
- Match between sample and research question
- Generalizable to future use cases
]

---
class: onecol
## Measurement and Feature Engineering

.left-column[
</br>
<img src="data:image/png;base64,#engineer.jpg" width="100%" />
]
.right-column[
- How were features and outcomes measured?
- Justification of measurement decisions
- Rationale for feature engineering
- Clear description of feature engineering (e.g., one-hot vs dummy coding for categorical features)
- Appropriate feature engineering to match specific algorithms (e.g., normalizing features for regularized regression)
- Clear description of missing data and missing data handling (e.g., listwise deletion vs imputation)
]

---
class: onecol
## Resampling and Data Leakage

.left-column[
</br>
<img src="data:image/png;base64,#leak.png" width="100%" />
]
.right-column[
- Clear separation between model training vs. evaluation
- Is there any evidence of data leakage (i.e., using information from the test set to guide modeling decisions)?
- Adequate size of test sets during resampling
- Justification of resampling methods
- Understanding of limitations of resampling methods
- Justification of algorithm selection
- Appropriate handling of multilevel/nested data
]

---
class: onecol
## Model Evaluation and Interpretation

.left-column[
</br>
<img src="data:image/png;base64,#eval.png" width="100%" />
]
.right-column[
- Are the authors' interpretations and claims supported by the modeling methods and algorithms?
- Are evaluation metrics differentiated appropriately (e.g., accuracy vs. AUROC vs. sensitivity vs. specificity)?
- Appropriate interpretation of performance
- Not overstating performance
- Limitations adequately stated
]

---
## Comprehension check

<font size="5">**Identify the problems with each sentence in a paper you're reviewing.**</font>

.pull-left[
<font size="5">**Question 1**</font>

"*We used ridge regularization to detect and model complex nonlinear relationships*"

a) Ridge is not a useful predictive algorithm.

b) Ridge cannot include any nonlinear terms.

c) Ridge isn't well-suited for modeling non-linearities; other algorithms are better.

d) Ridge is not a machine learning method.
]
.pull-right[
<font size="5">**Question 2**</font>

"*Our model provided excellent predictions, with an AUC of 0.68.*"

a) AUC is not a good performance metric.

b) An AUC of 0.68 isn't very good.

c) An AUC of 0.68 is not within the range of possible AUC values.

d) AUC does not provide useful information about predictive accuracy.
]

---
class: onecol
## Group Discussion

We will randomly assign you to a small breakout room. We will jump between rooms to join discussions and answer questions.

**Discuss the following topics:**

1. Which ML algorithms are you most interested in applying to your data, and why? Do you anticipate any challenges?

2. Have you written any ML papers (or are you planning to)? What information do you find helpful to include in the methods and results?

3. Have you peer-reviewed (or read) any ML papers in your field? What do you typically pay attention to? What elements do you enjoy, and what concerns do you have?
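---
class: onecol
## Appendix: Variable Importance in {caret}

Variable importance came up in both the Results checklist and the reviewing checklists. As a minimal sketch of how to extract it with {caret} (the random forest model and the `iris` data are illustrative choices, not a recommendation):

```r
library(caret)

set.seed(2021)
fit <- train(Species ~ ., data = iris,
             method = "rf",
             trControl = trainControl(method = "cv", number = 5))

imp <- varImp(fit)  # scaled importance scores (0-100) for each feature
imp
plot(imp)           # quick importance plot, handy for Results sections
```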